image manifold
We use a variety of evaluation metrics to diagnose the effect that training with instance selection has on the learned distribution. In all cases where a reference distribution is required, we use the original training distribution, not the distribution produced after instance selection. Inception Score (IS) [24] evaluates samples by extracting class probabilities from an ImageNet-pretrained Inception v3 classifier and measuring the distribution of outputs over all samples. Classification Accuracy Score (CAS) [23, 25] was introduced for evaluating the usefulness of conditional generative models for augmenting downstream tasks such as image classification.

Model       Params (M)  Batch Size  Retention Ratio (%)  IS     FID    P  R  D  C
BigGAN      52.54       512         100                  25.43  10.55  -  -  -  -
FQ-BigGAN   52.54       512         100                  25.96  9.67   -  -  -  -

The truncation trick is a simple and popular technique which is used to increase the visual fidelity of samples from a GAN at the expense of reduced diversity [2].
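The IS mentioned above can be computed from classifier outputs alone; a minimal sketch, with the Inception v3 classifier replaced by synthetic softmax outputs (the classifier itself is assumed to be available separately):

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """Inception Score from an (N, num_classes) matrix of softmax outputs.

    IS = exp( E_x[ KL( p(y|x) || p(y) ) ] ), where p(y) is the marginal
    class distribution averaged over all samples.
    """
    probs = np.asarray(probs, dtype=np.float64)
    marginal = probs.mean(axis=0)                      # p(y)
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1)
    return float(np.exp(kl.mean()))

# Sharply peaked, diverse predictions -> high IS; uniform predictions -> IS ~ 1.
confident = np.eye(10)[np.arange(100) % 10]            # one-hot over 10 classes
uniform = np.full((100, 10), 0.1)
print(inception_score(confident))   # ~10 (number of classes)
print(inception_score(uniform))     # ~1
```

High IS requires both confident per-sample predictions and a spread-out marginal, which is why it rewards diverse, recognizable samples.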
Reviews: Adaptive Density Estimation for Generative Models
Summary: The authors propose a hybrid method that combines VAEs with adversarial training and flow-based models. In particular, they derive an explicit density function p(x) where the likelihood can be evaluated, the corresponding components p(x|z) are more flexible than the diagonal Gaussians of a standard VAE, and the generated samples have better quality than a standard VAE's. The basic idea of the proposed model is that the VAE is defined between a latent space and an intermediate representation space, and then the representation space is connected with the data space through an invertible non-linear flow. In general, I think the paper is quite well written, but at the same time I believe that there is a lot of compressed information, and the consequence is that in some parts it is not even clear what the authors want to say (see Clarity comments). The proposed idea of the paper seems quite interesting, but at the same time I have some doubts (see Quality comments).
- Personal > Opinion (0.38)
- Summary/Review (0.35)
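If I understand the construction correctly, the explicit density comes from a standard change of variables through the flow, with the VAE bound applied in the representation space; a sketch in my own notation (y is the intermediate representation, f the invertible flow), not the paper's exact formulation:

```latex
\log p(x) = \log p_Y\big(f^{-1}(x)\big)
          + \log\left|\det \frac{\partial f^{-1}(x)}{\partial x}\right|,
\qquad
\log p_Y(y) \;\ge\; \mathbb{E}_{q(z \mid y)}\big[\log p(y \mid z)\big]
            - \mathrm{KL}\big(q(z \mid y)\,\big\|\,p(z)\big).
```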
The Manifold Hypothesis for Gradient-Based Explanations
Bordt, Sebastian, Upadhyay, Uddeshya, Akata, Zeynep, von Luxburg, Ulrike
When do gradient-based explanation algorithms provide perceptually-aligned explanations? We propose a criterion: the feature attributions need to be aligned with the tangent space of the data manifold. To provide evidence for this hypothesis, we introduce a framework based on variational autoencoders that allows us to estimate and generate image manifolds. Through experiments across a range of different datasets -- MNIST, EMNIST, CIFAR10, X-ray pneumonia and Diabetic Retinopathy detection -- we demonstrate that the more a feature attribution is aligned with the tangent space of the data, the more perceptually-aligned it tends to be. We then show that the attributions provided by popular post-hoc methods such as Integrated Gradients and SmoothGrad are more strongly aligned with the data manifold than the raw gradient. Adversarial training also improves the alignment of model gradients with the data manifold. As a consequence, we suggest that explanation algorithms should actively strive to align their explanations with the data manifold. This is an extended version of a CVPR Workshop paper. Code is available at https://github.com/tml-tuebingen/explanations-manifold.
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.34)
- Europe > Italy > Marche > Ancona Province > Ancona (0.04)
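The alignment criterion can be made concrete with a small numerical sketch (my own toy construction, not the authors' released code): given an orthonormal basis U for the tangent space at a point, the manifold-aligned fraction of an attribution g is ||U^T g|| / ||g||.

```python
import numpy as np

rng = np.random.default_rng(0)

def tangent_alignment(g, U):
    """Fraction of the attribution's norm lying in the tangent space.

    U: (d, k) matrix with orthonormal columns spanning the tangent space.
    Returns a value in [0, 1]; 1 means perfectly manifold-aligned.
    """
    return float(np.linalg.norm(U.T @ g) / np.linalg.norm(g))

d, k = 100, 5                                     # ambient dim, manifold dim
U, _ = np.linalg.qr(rng.standard_normal((d, k)))  # orthonormal tangent basis

in_tangent = U @ rng.standard_normal(k)  # attribution inside the tangent space
random_attr = rng.standard_normal(d)     # generic attribution

print(tangent_alignment(in_tangent, U))   # 1.0
print(tangent_alignment(random_attr, U))  # roughly sqrt(k/d) in expectation
```

Under the paper's hypothesis, attributions scoring near 1 on this measure should be the perceptually-aligned ones.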
Your diffusion model secretly knows the dimension of the data manifold
Stanczuk, Jan, Batzolis, Georgios, Deveney, Teo, Schönlieb, Carola-Bibiane
In this work, we propose a novel framework for estimating the dimension of the data manifold using a trained diffusion model. A diffusion model approximates the score function, i.e., the gradient of the log density of a noise-corrupted version of the target distribution for varying levels of corruption. We prove that, if the data concentrates around a manifold embedded in the high-dimensional ambient space, then as the level of corruption decreases, the score function points towards the manifold, as this direction becomes the direction of maximal likelihood increase. Therefore, for small levels of corruption, the diffusion model provides us with access to an approximation of the normal bundle of the data manifold. This allows us to estimate the dimension of the tangent space, and thus the intrinsic dimension of the data manifold. To the best of our knowledge, our method is the first estimator of the data manifold dimension based on diffusion models, and it outperforms well-established statistical estimators in controlled experiments on both Euclidean and image data.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > New York (0.04)
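The estimator sketched above can be checked in a toy setting where the manifold is a linear subspace and the smoothed score is available in closed form (my own simplification; a trained diffusion model would supply the score in practice): collect score vectors at noisy points, and the rank of the stacked matrix gives the codimension.

```python
import numpy as np

rng = np.random.default_rng(1)

d, k, sigma = 8, 3, 0.05           # ambient dim, true manifold dim, noise level

# Toy "manifold": a k-dimensional linear subspace spanned by orthonormal V.
V, _ = np.linalg.qr(rng.standard_normal((d, k)))
P = V @ V.T                        # projection onto the subspace

def score(x):
    """Exact score of the sigma-smoothed uniform-on-subspace density.

    In this linear toy the score is -x_perp / sigma^2: it points straight
    back at the manifold, the behaviour the paper proves for small
    corruption levels.
    """
    return (P @ x - x) / sigma**2

# Collect scores at many noisy points around one manifold point.
x0 = V @ rng.standard_normal(k)
scores = np.stack([score(x0 + sigma * rng.standard_normal(d))
                   for _ in range(200)])

# The scores span (approximately) the normal space; its rank is the codimension.
svals = np.linalg.svd(scores, compute_uv=False)
codim = int(np.sum(svals > 0.01 * svals[0]))
print("estimated intrinsic dimension:", d - codim)   # -> 3
```

With a learned score the singular values decay gradually rather than dropping to zero, so the threshold (here 1% of the top singular value) becomes the practically important knob.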
Diffusion Models Generate Images Like Painters: an Analytical Theory of Outline First, Details Later
How do diffusion generative models convert pure noise into meaningful images? We argue that generation involves first committing to an outline, and then to finer and finer details. The corresponding reverse diffusion process can be modeled by dynamics on a (time-dependent) high-dimensional landscape full of Gaussian-like modes, which makes the following predictions: (i) individual trajectories tend to be very low-dimensional; (ii) scene elements that vary more within training data tend to emerge earlier; and (iii) early perturbations substantially change image content more often than late perturbations. We show that the behavior of a variety of trained unconditional and conditional diffusion models like Stable Diffusion is consistent with these predictions. Finally, we use our theory to search for the latent image manifold of diffusion models, and propose a new way to generate interpretable image variations. Our viewpoint suggests generation by GANs and diffusion models have unexpected similarities.
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Europe > France > Hauts-de-France > Nord > Lille (0.04)
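Prediction (iii) can be checked in a one-dimensional caricature (my own toy, not the authors' experiments): two data modes at ±1 play the role of two "scene layouts", the exact score of the noised mixture stands in for a trained model, and the reverse process is run deterministically so that early and late perturbations can be compared.

```python
import numpy as np

def score(x, v):
    """Exact score of 0.5*N(1, v) + 0.5*N(-1, v): two 'scene layouts'."""
    return (np.tanh(x / v) - x) / v

def reverse(x, sigmas, perturb_at=None, delta=0.0):
    """Deterministic (DDIM-style) reverse diffusion over a decreasing noise
    schedule, optionally injecting a perturbation at one step."""
    for i in range(len(sigmas) - 1):
        if i == perturb_at:
            x = x + delta
        v_now, v_next = sigmas[i] ** 2, sigmas[i + 1] ** 2
        x = x + (v_now - v_next) * score(x, v_now)
    return x

sigmas = np.geomspace(3.0, 1e-3, 200)   # high noise -> low noise
x_start = 0.05                          # slightly biased toward mode +1

base = reverse(x_start, sigmas)
early = reverse(x_start, sigmas, perturb_at=0, delta=-0.2)
late = reverse(x_start, sigmas, perturb_at=190, delta=-0.2)

# The early perturbation flips the committed mode (the 'outline'); the late
# one is pulled back toward the already-chosen mode (a 'detail').
print(abs(early - base), abs(late - base))
```

The same mechanism, scaled up to many Gaussian-like modes, is what the paper's landscape picture predicts for real image models.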
RGI: robust GAN-inversion for mask-free image inpainting and unsupervised pixel-wise anomaly detection
Mou, Shancong, Gu, Xiaoyi, Cao, Meng, Bai, Haoping, Huang, Ping, Shan, Jiulong, Shi, Jianjun
Generative adversarial networks (GANs), trained on a large-scale image dataset, can be a good approximator of the natural image manifold. GAN-inversion, using a pre-trained generator as a deep generative prior, is a promising tool for image restoration under corruptions. However, the performance of GAN-inversion can be limited by a lack of robustness to unknown gross corruptions, i.e., the restored image might easily deviate from the ground truth. In this paper, we propose a Robust GAN-inversion (RGI) method with a provable robustness guarantee to achieve image restoration under unknown gross corruptions, where a small fraction of pixels are completely corrupted. Under mild assumptions, we show that the restored image and the identified corrupted region mask converge asymptotically to the ground truth. Moreover, we extend RGI to Relaxed-RGI (R-RGI) for generator fine-tuning to mitigate the gap between the GAN learned manifold and the true image manifold while avoiding trivial overfitting to the corrupted input image, which further improves the image restoration and corrupted region mask identification performance. The proposed RGI/R-RGI method unifies two important applications with state-of-the-art (SOTA) performance: (i) mask-free semantic inpainting, where the corruptions are unknown missing regions and the restored background can be used to restore the missing content; (ii) unsupervised pixel-wise anomaly detection, where the corruptions are unknown anomalous regions and the retrieved mask can be used as the anomalous region's segmentation mask.
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
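The decomposition behind RGI can be illustrated with a linear stand-in for the generator and an ℓ1 penalty on the corruption, solved by alternating least squares and soft-thresholding (a toy sketch of the idea, not the authors' algorithm verbatim; the generator, dimensions, and penalty weight are all my own assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

n, k = 40, 5                                # "image" pixels, latent dim
A = rng.standard_normal((n, k))             # toy linear "generator" G(z) = A z
z_true = rng.standard_normal(k)
x_true = A @ z_true                         # clean image on the generator's range

corrupt_idx = np.array([3, 17, 29])         # a few grossly corrupted pixels
y = x_true.copy()
y[corrupt_idx] += 8.0

lam = 1.0                                   # l1 weight on the corruption term
e = np.zeros(n)                             # estimated sparse corruption
for _ in range(50):
    # z-step: least-squares fit of the generator to the "cleaned" observation
    z, *_ = np.linalg.lstsq(A, y - e, rcond=None)
    # e-step: soft-threshold the residual -> sparse corruption estimate
    r = y - A @ z
    e = np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)

mask = np.flatnonzero(np.abs(e) > 1e-6)     # identified corrupted region
restored = A @ z
print(mask, np.max(np.abs(restored - x_true)))
```

The support of the sparse term plays the role of RGI's recovered corruption mask, which is exactly what the inpainting and anomaly-detection applications reuse.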
Wassmap: Wasserstein Isometric Mapping for Image Manifold Learning
Hamm, Keaton, Henscheid, Nick, Kang, Shujie
In this paper, we propose Wasserstein Isometric Mapping (Wassmap), a nonlinear dimensionality reduction technique that provides solutions to some drawbacks in existing global nonlinear dimensionality reduction algorithms in imaging applications. Wassmap represents images via probability measures in Wasserstein space, then uses pairwise Wasserstein distances between the associated measures to produce a low-dimensional, approximately isometric embedding. We show that the algorithm is able to exactly recover parameters of some image manifolds, including those generated by translations or dilations of a fixed generating measure. Additionally, we show that a discrete version of the algorithm retrieves parameters from manifolds generated from discrete measures by providing a theoretical bridge to transfer recovery results from functional data to discrete data. Testing of the proposed algorithms on various image data manifolds shows that Wassmap yields good embeddings compared with other global and local techniques.
- North America > United States > Texas (0.04)
- North America > United States > Arizona (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Asia > Middle East > Jordan (0.04)
- Health & Medicine (0.68)
- Government (0.46)
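The Wassmap pipeline, specialized to 1-D translates where Wasserstein distances between equal-size empirical measures have a closed form (sorted matching), can be sketched as follows; the translation-manifold setup mirrors the paper's exact-recovery result, but the code is my own minimal construction:

```python
import numpy as np

rng = np.random.default_rng(3)

def w1(a, b):
    """W1 distance between equal-size, equal-weight empirical measures:
    optimal transport on the line matches sorted samples."""
    return np.abs(np.sort(a) - np.sort(b)).mean()

# Each "image" is a discrete measure: the same template point cloud,
# translated by a scalar parameter t_i (a 1-D translation manifold).
template = np.sort(rng.standard_normal(50))
params = np.linspace(0.0, 4.0, 9)
measures = [template + t for t in params]

# Pairwise Wasserstein distances; for translates these equal |t_i - t_j|.
n = len(measures)
D = np.array([[w1(measures[i], measures[j]) for j in range(n)]
              for i in range(n)])

# Classical MDS on the squared distance matrix (the Wassmap embedding step).
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
evals, evecs = np.linalg.eigh(B)
embedding = evecs[:, -1] * np.sqrt(max(evals[-1], 0.0))

# The 1-D embedding recovers the translation parameters up to sign and shift.
recovered = embedding - embedding[0]
if recovered[-1] < 0:
    recovered = -recovered
print(np.max(np.abs(recovered - params)))   # ~0: exact recovery
```

For 2-D images the only change is that the pairwise distances come from a discrete optimal-transport solver rather than the sorted 1-D formula.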
Boomerang: Local sampling on image manifolds using diffusion models
Luzi, Lorenzo, Siahkoohi, Ali, Mayer, Paul M, Casco-Rodriguez, Josue, Baraniuk, Richard
Diffusion models can be viewed as mapping points in a high-dimensional latent space onto a low-dimensional learned manifold, typically an image manifold. The intermediate values between the latent space and image manifold can be interpreted as noisy images which are determined by the noise scheduling scheme employed during pre-training. We exploit this interpretation to introduce Boomerang, a local image manifold sampling approach using the dynamics of diffusion models. We call it Boomerang because we first add noise to an input image, moving it closer to the latent space, then bring it back to the image space through diffusion dynamics. We use this method to generate images which are similar, but nonidentical, to the original input images on the image manifold. We are able to set how close the generated image is to the original based on how much noise we add. Additionally, the generated images have a degree of stochasticity, allowing us to locally sample as many times as we want without repetition. We show three applications for which Boomerang can be used. First, we provide a framework for constructing privacy-preserving datasets having controllable degrees of anonymity. Second, we show how to use Boomerang for data augmentation while staying on the image manifold. Third, we introduce a framework for image super-resolution with 8x upsampling. Boomerang does not require any modification to the training of diffusion models and can be used with pretrained models on a single, inexpensive GPU.
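The add-noise-then-denoise loop described above can be sketched against an analytic score for a linear toy manifold, standing in for the pretrained diffusion model (the manifold, score, schedule, and function names are all my own assumptions, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy "image manifold": a line {mu + s*u} in R^16, with an analytic score
# for its Gaussian-smoothed density playing the role of the trained network.
d = 16
mu = rng.standard_normal(d)
u = rng.standard_normal(d)
u /= np.linalg.norm(u)

def score(x, v):
    """Score of the manifold density smoothed with N(0, v*I): it pushes the
    off-manifold component back, which is what the reverse process exploits."""
    r = x - mu
    return -(r - (r @ u) * u) / v          # kill the normal component only

def boomerang(x, sigma_partial, n_steps=100, sigma_min=1e-3):
    """Partial forward noising, then deterministic reverse diffusion back."""
    x_noisy = x + sigma_partial * rng.standard_normal(d)  # toward latent space
    sigmas = np.geomspace(sigma_partial, sigma_min, n_steps)
    for v_now, v_next in zip(sigmas[:-1] ** 2, sigmas[1:] ** 2):
        x_noisy = x_noisy + (v_now - v_next) * score(x_noisy, v_now)
    return x_noisy

x0 = mu + 2.0 * u                          # a point on the manifold
x_local = boomerang(x0, sigma_partial=0.5)

# The sample lands back on the manifold, near -- but not identical to -- x0;
# sigma_partial controls how far the local sample can wander.
off_manifold = np.linalg.norm((x_local - mu) - ((x_local - mu) @ u) * u)
print(off_manifold, np.linalg.norm(x_local - x0))
```

With a real pretrained model, the analytic `score` is replaced by the network and the update by the model's own sampler; the partial-noise level plays the same role.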